Introduction¶

In this project, you will build a neural network of your own design to classify images from the CIFAR-10 dataset. Our target accuracy is 70%, but any accuracy over 50% is a great start. Some of the benchmark results on CIFAR-10 include:

78.9% Accuracy | Deep Belief Networks; Krizhevsky, 2010

90.6% Accuracy | Maxout Networks; Goodfellow et al., 2013

96.0% Accuracy | Wide Residual Networks; Zagoruyko et al., 2016

99.0% Accuracy | GPipe; Huang et al., 2018

98.5% Accuracy | Rethinking Recurrent Neural Networks and other Improvements for Image Classification; Nguyen et al., 2020

Research with this dataset is ongoing. Notably, many of these networks are quite large and quite expensive to train.

Imports¶

In [1]:
## This cell contains the essential imports you will need – DO NOT CHANGE THE CONTENTS! ##
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
!pip install plotly
import plotly.graph_objects as go

Load the Dataset¶

Specify your transforms as a list first. The transforms module is already loaded as transforms.

CIFAR-10 is fortunately included in the torchvision module. Then, you can create your dataset using the CIFAR10 object from torchvision.datasets (see the torchvision.datasets documentation). Make sure to specify download=True!

Once your dataset is created, you'll also need to define a DataLoader from the torch.utils.data module for both the train and the test set.

In [2]:
# Define transforms

# TODO: Define transforms for the training data and testing data


# Pass transforms in here, then run the next cell to see how the transforms look
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

test_transforms = transforms.Compose([transforms.ToTensor(),
                                      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])


# Create training set and define training dataloader
train_data = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=train_transforms)
trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True, num_workers=2)


# Create test set and define test dataloader
test_data = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=test_transforms)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64, shuffle=False, num_workers=2)


# The 10 classes in the dataset
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Files already downloaded and verified
Files already downloaded and verified

Explore the Dataset¶

Using matplotlib, numpy, and torch, explore the dimensions of your data.

You can view images using the show5 function defined below – it takes a data loader as an argument. Remember that normalized images will look really weird to you! You may want to try changing your transforms to view images. Typically using no transforms other than ToTensor() works well for viewing – but not as well for training your network. If show5 doesn't work, go back and check your code for creating your data loaders and your training/test sets.

In [14]:
# The original show5 function crashed my kernel systematically, so I rewrote it using PIL instead.
def show5(img_loader):
    dataiter = iter(img_loader)
    
    batch = next(dataiter)
    labels = batch[1][0:5]
    images = batch[0][0:5]
    for i in range(5):
        print(classes[labels[i]])
        
        image = images[i].numpy()
        image = image.transpose((1, 2, 0))  # Transpose the image dimensions
        image = (image * 0.5) + 0.5  # Remove normalization
        image = (image * 255).astype(np.uint8)  # Convert to uint8
        pil_image = Image.fromarray(image)
        pil_image.show()

# def show5(img_loader):
#     dataiter = iter(img_loader)
    
#     batch = next(dataiter)
#     labels = batch[1][0:5]
#     images = batch[0][0:5]
#     for i in range(5):
#         print(classes[labels[i]])
    
#         image = images[i].numpy()
#         plt.imshow(image.T)
#         plt.show()
In [15]:
# Explore data
show5(trainloader)
show5(testloader)
cat
truck
ship
horse
frog
cat
ship
ship
plane
frog

Build your Neural Network¶

Using the layers in torch.nn (which has been imported as nn) and the torch.nn.functional module (imported as F), construct a neural network based on the parameters of the dataset. Feel free to construct a model of any architecture – feedforward, convolutional, or even something more advanced!

In [8]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.fc1 = nn.Linear(32 * 8 * 8, 128)
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool1(x)
        
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.pool2(x)
        
        x = x.view(x.size(0), -1)
        
        x = self.fc1(x)
        x = self.relu3(x)
        x = self.fc2(x)
                
        return x
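The hard-coded flatten size of `32 * 8 * 8` comes from the two 2×2 max-poolings halving the 32×32 input twice (32 → 16 → 8) with 32 output channels from the last convolution. A quick way to double-check it (a sketch that rebuilds just the convolutional stack of SimpleCNN) is to push a dummy batch through:

```python
import torch
import torch.nn as nn

# Reproduce only the conv/pool portion of SimpleCNN
conv_stack = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),   # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),   # 16x16 -> 8x8
)

dummy = torch.zeros(1, 3, 32, 32)    # one fake CIFAR-10 image
out = conv_stack(dummy)
print(out.shape)                     # torch.Size([1, 32, 8, 8])
print(out.view(1, -1).shape[1])      # 2048 == 32 * 8 * 8
```

This trick is handy whenever you change kernel sizes, strides, or pooling: run a dummy tensor through and read off the flattened size instead of computing it by hand.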

Specify a loss function and an optimizer, and instantiate the model.

If you use a less common loss function, please note why you chose that loss function in a comment.

In [10]:
model = SimpleCNN()
criterion = nn.CrossEntropyLoss() # combines log-softmax and negative log-likelihood loss in a single step
optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)

Running your Neural Network¶

Use whatever method you like to train your neural network, and ensure you record the average loss at each epoch. Don't forget to use torch.device() and the .to() method for both your model and your data if you are using GPU!

If you want to print your loss during each epoch, you can use the enumerate function and print the loss after a set number of batches. 250 batches works well for most people!
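The loop below runs on CPU; moving it to a GPU only takes a `torch.device` and a few `.to()` calls. A minimal sketch (using a stand-in `nn.Linear` so it runs on its own; in the notebook you would move `SimpleCNN` and each batch the same way):

```python
import torch
import torch.nn as nn

# Pick the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Move the model's parameters to the chosen device once, before training
model = nn.Linear(4, 2).to(device)   # stand-in for SimpleCNN()

# Inside the training loop, move each batch the same way:
images = torch.randn(8, 4).to(device)
output = model(images)
print(output.device.type)            # 'cuda' on a GPU machine, otherwise 'cpu'
```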

In [11]:
epochs = 45

train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        
        optimizer.zero_grad()
        
        log_ps = model(images)
        loss = criterion(log_ps, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
    else:
        test_loss = 0
        accuracy = 0
        
        # Turn off gradients for validation, saves memory and computations
        with torch.no_grad():
            model.eval()
            for images, labels in testloader:
                log_ps = model(images)
                # .item() accumulates a plain float rather than a tensor
                test_loss += criterion(log_ps, labels).item()
                
                # The model outputs raw logits (CrossEntropyLoss applies
                # log-softmax internally), so exp() does not yield normalized
                # probabilities here -- but topk picks the same class either way.
                ps = torch.exp(log_ps)
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor))
        
        model.train()
        
        train_losses.append(running_loss/len(trainloader))
        test_losses.append(test_loss/len(testloader))

        print("Epoch: {}/{}.. ".format(e+1, epochs),
              "Training Loss: {:.3f}.. ".format(train_losses[-1]),
              "Test Loss: {:.3f}.. ".format(test_losses[-1]),
              "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))
Epoch: 1/45..  Training Loss: 1.897..  Test Loss: 1.588..  Test Accuracy: 0.423
Epoch: 2/45..  Training Loss: 1.515..  Test Loss: 1.400..  Test Accuracy: 0.493
Epoch: 3/45..  Training Loss: 1.387..  Test Loss: 1.297..  Test Accuracy: 0.538
Epoch: 4/45..  Training Loss: 1.288..  Test Loss: 1.197..  Test Accuracy: 0.580
Epoch: 5/45..  Training Loss: 1.214..  Test Loss: 1.111..  Test Accuracy: 0.611
Epoch: 6/45..  Training Loss: 1.160..  Test Loss: 1.091..  Test Accuracy: 0.617
Epoch: 7/45..  Training Loss: 1.121..  Test Loss: 1.020..  Test Accuracy: 0.646
Epoch: 8/45..  Training Loss: 1.078..  Test Loss: 1.010..  Test Accuracy: 0.647
Epoch: 9/45..  Training Loss: 1.043..  Test Loss: 0.993..  Test Accuracy: 0.653
Epoch: 10/45..  Training Loss: 1.009..  Test Loss: 0.978..  Test Accuracy: 0.663
Epoch: 11/45..  Training Loss: 0.987..  Test Loss: 0.927..  Test Accuracy: 0.679
Epoch: 12/45..  Training Loss: 0.968..  Test Loss: 0.932..  Test Accuracy: 0.677
Epoch: 13/45..  Training Loss: 0.941..  Test Loss: 0.890..  Test Accuracy: 0.688
Epoch: 14/45..  Training Loss: 0.922..  Test Loss: 0.896..  Test Accuracy: 0.689
Epoch: 15/45..  Training Loss: 0.902..  Test Loss: 0.879..  Test Accuracy: 0.691
Epoch: 16/45..  Training Loss: 0.889..  Test Loss: 0.877..  Test Accuracy: 0.692
Epoch: 17/45..  Training Loss: 0.867..  Test Loss: 0.861..  Test Accuracy: 0.696
Epoch: 18/45..  Training Loss: 0.857..  Test Loss: 0.822..  Test Accuracy: 0.712
Epoch: 19/45..  Training Loss: 0.839..  Test Loss: 0.824..  Test Accuracy: 0.712
Epoch: 20/45..  Training Loss: 0.836..  Test Loss: 0.828..  Test Accuracy: 0.710
Epoch: 21/45..  Training Loss: 0.822..  Test Loss: 0.809..  Test Accuracy: 0.717
Epoch: 22/45..  Training Loss: 0.804..  Test Loss: 0.827..  Test Accuracy: 0.710
Epoch: 23/45..  Training Loss: 0.799..  Test Loss: 0.790..  Test Accuracy: 0.721
Epoch: 24/45..  Training Loss: 0.788..  Test Loss: 0.818..  Test Accuracy: 0.713
Epoch: 25/45..  Training Loss: 0.777..  Test Loss: 0.786..  Test Accuracy: 0.727
Epoch: 26/45..  Training Loss: 0.767..  Test Loss: 0.808..  Test Accuracy: 0.716
Epoch: 27/45..  Training Loss: 0.758..  Test Loss: 0.772..  Test Accuracy: 0.731
Epoch: 28/45..  Training Loss: 0.749..  Test Loss: 0.762..  Test Accuracy: 0.735
Epoch: 29/45..  Training Loss: 0.740..  Test Loss: 0.779..  Test Accuracy: 0.729
Epoch: 30/45..  Training Loss: 0.738..  Test Loss: 0.759..  Test Accuracy: 0.736
Epoch: 31/45..  Training Loss: 0.724..  Test Loss: 0.768..  Test Accuracy: 0.736
Epoch: 32/45..  Training Loss: 0.719..  Test Loss: 0.763..  Test Accuracy: 0.738
Epoch: 33/45..  Training Loss: 0.710..  Test Loss: 0.785..  Test Accuracy: 0.730
Epoch: 34/45..  Training Loss: 0.702..  Test Loss: 0.762..  Test Accuracy: 0.732
Epoch: 35/45..  Training Loss: 0.704..  Test Loss: 0.768..  Test Accuracy: 0.737
Epoch: 36/45..  Training Loss: 0.692..  Test Loss: 0.778..  Test Accuracy: 0.734
Epoch: 37/45..  Training Loss: 0.683..  Test Loss: 0.759..  Test Accuracy: 0.735
Epoch: 38/45..  Training Loss: 0.687..  Test Loss: 0.755..  Test Accuracy: 0.741
Epoch: 39/45..  Training Loss: 0.675..  Test Loss: 0.742..  Test Accuracy: 0.746
Epoch: 40/45..  Training Loss: 0.670..  Test Loss: 0.745..  Test Accuracy: 0.742
Epoch: 41/45..  Training Loss: 0.666..  Test Loss: 0.748..  Test Accuracy: 0.742
Epoch: 42/45..  Training Loss: 0.661..  Test Loss: 0.754..  Test Accuracy: 0.746
Epoch: 43/45..  Training Loss: 0.655..  Test Loss: 0.751..  Test Accuracy: 0.746
Epoch: 44/45..  Training Loss: 0.648..  Test Loss: 0.748..  Test Accuracy: 0.743
Epoch: 45/45..  Training Loss: 0.643..  Test Loss: 0.753..  Test Accuracy: 0.747

Plot the training loss (and validation loss/accuracy, if recorded).

In [12]:
# The original code (commented below) crashed my kernel systematically, so I rewrote it using plotly instead.

# plt.plot(train_losses, label='Training loss')
# plt.plot(test_losses, label='Validation loss')
# plt.legend(frameon=False)

# Create the line plot
fig = go.Figure()
fig.add_trace(go.Scatter(x=list(range(len(train_losses))), y=train_losses, name='Training loss'))
fig.add_trace(go.Scatter(x=list(range(len(test_losses))), y=test_losses, name='Validation loss'))

# Update layout
fig.update_layout(showlegend=True)

# Show the plot
fig.show()

Testing your model¶

Using the previously created DataLoader for the test set, compute the percentage of correct predictions using the highest probability prediction.

If your accuracy is over 70%, great work! This is a hard task to exceed 70% on.

If your accuracy is under 45%, you'll need to make improvements. Go back and check your model architecture, loss function, and optimizer to make sure they're appropriate for an image classification task.

In [ ]:
# See the training log above: the final test accuracy is above the 70% target.

Saving your model¶

Using torch.save, save your model for future loading.

In [13]:
checkpoint = {'state_dict': model.state_dict()}

torch.save(checkpoint, 'checkpoint.pth')
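To restore the model later, rebuild the architecture first and then load the saved weights into it. A sketch (using a stand-in `nn.Linear`; in the notebook you would instantiate `SimpleCNN()` and load `checkpoint.pth`):

```python
import torch
import torch.nn as nn

# Stand-in for SimpleCNN so the sketch is self-contained
model = nn.Linear(4, 2)
torch.save({'state_dict': model.state_dict()}, 'checkpoint.pth')

# Loading: recreate the same architecture, then restore its weights
restored = nn.Linear(4, 2)
checkpoint = torch.load('checkpoint.pth')
restored.load_state_dict(checkpoint['state_dict'])
restored.eval()  # switch to inference mode before making predictions
```

Note that `load_state_dict` only fills in parameters: the class definition itself must be available (or re-run) before loading.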

Make a Recommendation¶

My model achieved over 74% accuracy – much lower than the state-of-the-art models, but higher than Detectocorp's algorithm. Given that, and the fact that my model is relatively simple and trained quickly, I would definitely recommend building on it. My model has a total of 6 layers and 3 activation functions. Some paths worth exploring:

  • Increase the depth of the network: more convolutional layers can be added to capture more complex features. Risk: overfitting.

  • Adjust the kernel size and stride: Smaller kernel sizes can capture finer details, while larger kernel sizes can capture more global features.

  • Try different activation functions: LeakyReLU, ELU, SELU, etc.

  • Add regularization techniques: Regularization techniques like dropout can help prevent overfitting and improve generalization.

  • Increase the number of filters: You can increase the number of filters in the convolutional layers to capture more diverse features. Risk: computational cost and potential overfitting.

  • Adjust the learning rate and optimizer: a learning-rate schedule or an adaptive optimizer such as Adam may converge faster.
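Several of these suggestions (more filters, a different activation, dropout) can be combined into a straightforward variant of SimpleCNN. A hypothetical sketch – untested as an architecture, shown only to make the suggestions concrete:

```python
import torch
import torch.nn as nn

class DeeperCNN(nn.Module):
    """Hypothetical SimpleCNN variant: wider conv layers, LeakyReLU, dropout."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # more filters: 32
            nn.LeakyReLU(),                               # alternative activation
            nn.MaxPool2d(2, 2),                           # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # more filters: 64
            nn.LeakyReLU(),
            nn.MaxPool2d(2, 2),                           # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),                              # regularization
            nn.Linear(64 * 8 * 8, 128),
            nn.LeakyReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Shape check with a dummy batch of two CIFAR-10-sized images
out = DeeperCNN()(torch.zeros(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 10])
```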
